Abstract

We address the statistical estimation of composite functionals which may be nonlinear in 
the probability measure. Our study is motivated by the need to estimate coherent measures 
of risk, which have become increasingly popular in finance, insurance, and other areas associated
with optimization under uncertainty and risk. We establish central limit formulae for composite
risk functionals. Furthermore, we discuss the asymptotic behavior of optimization problems
whose objectives are composite risk functionals, and we establish a central limit formula for
their optimal values when an estimator of the risk functional is used. While the mathematical
structures accommodate commonly used coherent measures of risk, they have a more general
character, which may be of independent interest. 
1 Introduction

Increased interest in the analysis of coherent measures of risk is motivated by their application as
mathematical models of risk quantification in finance and other areas. This line of research leads to new
mathematical problems in convex analysis, optimization and statistics. The uncertainty in risk
assessment is expressed mathematically as a functional of a random variable, which may be nonlinear
with respect to the probability measure. Most frequently, the risk measures of interest in practice
arise when we evaluate gains or losses depending on the choice $z$, which represents the control of
a decision maker, and on random quantities, which may be summarized in a random vector $X$. More
precisely, we are interested in the functional $f(z, X)$, which may be optimized under practically
relevant restrictions on the decisions $z$. Most frequently, some moments of the random variable
$Y = f(z, X)$ are evaluated. However, when models of risk are used, the existing theory of statistical
estimation is not always applicable.

Our goal is to address the question of statistical estimation of composite functionals depending
on random vectors and their moments. Additionally, we analyse the optimal values of such
functionals when they depend on finite-dimensional decisions within a deterministic compact set. The
known coherent measures of risk can be cast in the structures considered here, and we shall
specialize our results to several classes of popular risk measures. We emphasize, however, that the results
address composite functionals of more general structure with a potentially wider applicability.

An axiomatic definition of risk measures was first proposed in [18]. The currently accepted
definition of a coherent risk measure was introduced in [1] for finite probability spaces and was further
extended to more general spaces in (HU [13]. Given a probability space $(\Omega, \mathcal{F}, P)$, we consider the set
of random variables defined on it which have finite $p$-th moments, and denote it by $\mathcal{L}_p(\Omega, \mathcal{F}, P)$. A
coherent measure of risk is a convex, monotonically increasing, and positively homogeneous
functional $\rho : \mathcal{L}_p(\Omega, \mathcal{F}, P) \to \overline{\mathbb{R}}$ which satisfies the translation equivariance property $\rho(Y + a) = \rho(Y) + a$
for all $a \in \mathbb{R}$. Here $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$ and we assume that $Y$ represents losses, i.e., smaller realizations
are preferred. Related concepts are introduced in [3I1II2].

A measure of risk is called law-invariant if it depends only on the distribution of the random
variable, i.e., if $\rho(X) = \rho(Y)$ for all random variables $X, Y \in \mathcal{L}_p(\Omega, \mathcal{F}, P)$ having the same
distribution.

A practically relevant law-invariant coherent measure of risk is the mean-semideviation of order
$p \ge 1$ (see [211125], [36, s. 6.2.2]), defined in the following way:

\[
\rho(X) = \mathbb{E}[X] + \kappa \big\| (X - \mathbb{E}[X])_+ \big\|_p
        = \mathbb{E}[X] + \kappa \Big( \mathbb{E}\big[ (\max\{0, X - \mathbb{E}[X]\})^p \big] \Big)^{1/p}, \tag{1}
\]

where $\kappa \in [0,1]$. Note the nonlinearity with respect to the probability measure in formula (1).

Another popular law-invariant coherent measure of risk is the Average Value at Risk at level
$\alpha \in (0, 1]$ (see [301 US]), which is defined as follows:

\[
\operatorname{AVaR}_\alpha(X) = \frac{1}{\alpha} \int_{1-\alpha}^{1} F_X^{-1}(\beta)\, d\beta
  = \min_{\eta \in \mathbb{R}} \Big\{ \eta + \frac{1}{\alpha}\, \mathbb{E}\big[ (X - \eta)_+ \big] \Big\}. \tag{2}
\]

Here, $F_X(\cdot)$ denotes the distribution function of $X$. The reader may consult, for example, [36,
Chapter 6] and the references therein for a more detailed discussion of these risk measures and their
representation.

The risk measure $\operatorname{AVaR}_\alpha(\cdot)$ plays a fundamental role as a building block in the description of
every law-invariant coherent risk measure via the Kusuoka representation. The original result is
presented in |2nj for risk measures defined on $\mathcal{L}_\infty(\Omega, \mathcal{F}, P)$, with an atomless probability space. It
states that for every law-invariant coherent risk measure $\rho(\cdot)$, a convex set $\mathcal{M} \subset \mathcal{P}(0,1]$ exists such
that for all $X \in \mathcal{L}_\infty(\Omega, \mathcal{F}, P)$, it holds

\[
\rho(X) = \sup_{m \in \mathcal{M}} \int_0^1 \operatorname{AVaR}_\alpha(X)\, m(d\alpha). \tag{3}
\]

Here $\mathcal{P}(0,1]$ denotes the set of probability measures on the interval $(0, 1]$. This result is extended
to the setting of $\mathcal{L}_p$ spaces with $p \in [1, \infty)$; see m, m, [28], [36], m, and the references therein.

The extremal representation of $\operatorname{AVaR}_\alpha(X)$ on the right-hand side of (2) was used as a motivation
in m to propose the following higher-moment coherent measures of risk:

\[
\rho(X) = \min_{\eta \in \mathbb{R}} \Big\{ \eta + \frac{1}{\alpha} \big\| (X - \eta)_+ \big\|_p \Big\}, \quad p > 1. \tag{4}
\]

These risk measures are special cases of a more general family considered in [7]; they are also
examples of optimized certainty equivalents of [3]. In the paper [9], the explicit Kusuoka representation
for the higher-order risk measures (4) was described by utilising duality theorems from [29]. These
risk measures are used for portfolio optimization in m, where their advantages in comparison
to the classical mean-variance optimization model of Markowitz (ED [22]) are demonstrated on
examples. The recent work [23] indicates that if such a risk measure is used as a risk criterion in
European option portfolio optimization, the time evolution of the portfolio is superior to the
evolution of a portfolio optimized with respect to the AVaR risk or with respect to the mean-variance
optimization model of Markowitz. Similar observations were recently made in [15].

The connection of measures of risk to utility theories is discussed in the literature. Many of
the risk measures of interest can be expressed via optimization of the so-called optimized certainty
equivalent [3] for a suitable choice of the utility function. Relations of risk measures to
rank-dependent utility functions are given in m. In [10], it is established that coherent measures of
risk are a numerical representation of a certain preference relation defined on the space of bounded
quantile functions.

In practical applications, we deal with samples and stochastic models of the underlying random
quantities. Therefore, the questions pertaining to statistical estimation of the measures of risk are
crucial to the proper use of law-invariant measures of risk. Several measures of risk have an explicit
formula, which can be used as a plug-in estimator, with the original measure $P$ replaced by the
empirical measure. The empirical quantile is a natural estimator of the Value at Risk. A natural
empirical estimator of $\operatorname{AVaR}_\alpha(X)$ leads to the use of the L-statistic (see [16], [8]). Furthermore, the
Kusuoka representation, as well as the use of distortion functions in insurance, has motivated the
construction and analysis of empirical estimates of spectral measures of risk using L-statistics. We
refer to [MiEKniiiiEais] for more details on this approach. Some risk measures, such as the tail
risk measures of form (4), cannot be estimated via simple explicit formulae but are obtained as a
solution of a convex optimization problem with convex constraints. Although the asymptotic behavior
of optimal values of sample-based expected value models has been investigated before (see [32, Ch.
8], [36, Ch. 5] and the references therein), the existing results do not address models with risk
measures.

Our paper is organized as follows. Section 2 contains the key result of our paper, which
establishes a central limit formula for a composite risk functional. We provide a characterization
of the limiting distribution of the empirical estimators for such functionals. Section 3 contains a
central limit formula for risk functionals which are obtained as the optimal value of composite
functionals. Section 4 provides asymptotic analysis and central limit formulae for the optimal value
of optimization problems which use measures of risk in their objective functions. We pay special
attention to some popular measures and we discuss several illustrative examples in Sections 2, 3, and
4. In Section 5, we perform a simple simulation study to assess the accuracy of our approximations.
Section 6 concludes.


2 Estimation of composite risk functionals 


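In the first part of our paper, we focus on functionals of the following form:

\[
\rho(X) = \mathbb{E}\Big[ f_1\Big( \mathbb{E}\big[ f_2\big( \mathbb{E}[\, \cdots f_k( \mathbb{E}[f_{k+1}(X)], X)\, ] \cdots, X \big) \big], X \Big) \Big],
\]

where $X$ is an $m$-dimensional random vector, $f_j : \mathbb{R}^{m_j} \times \mathbb{R}^m \to \mathbb{R}^{m_{j-1}}$, $j = 1,\dots,k$, with $m_0 = 1$,
and $f_{k+1} : \mathbb{R}^m \to \mathbb{R}^{m_k}$. Let $\mathcal{X} \subset \mathbb{R}^m$ be the domain of the random variable $X$. We denote the
probability distribution of $X$ by $P$.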
Given a sample $X_1, \dots, X_n$ of independent identically distributed observations, we consider the
following plug-in empirical estimate of the value of $\rho$:

\[
\rho^{(n)} = \frac{1}{n} \sum_{i_0=1}^{n} f_1\Big( \frac{1}{n} \sum_{i_1=1}^{n} f_2\Big( \frac{1}{n} \sum_{i_2=1}^{n} \cdots f_k\Big( \frac{1}{n} \sum_{i_k=1}^{n} f_{k+1}(X_{i_k}), X_{i_{k-1}} \Big) \cdots, X_{i_1} \Big), X_{i_0} \Big).
\]

Our construction is motivated by the aim to estimate coherent measures of risk from the family
of mean-semideviations (1).

Example 2.1 (Semideviations). Consider the functional (1) representing the mean-semideviation
of order $p \ge 1$. In this case, we have $k = 2$, $m = 1$, and

\[
f_1(\eta_1, x) = x + \kappa\, \eta_1^{1/p},
\]
\[
f_2(\eta_2, x) = \big[ \max\{0, x - \eta_2\} \big]^p,
\]
\[
f_3(x) = x. \qquad ▲
\]
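
The plug-in estimate can be evaluated level by level, because each inner empirical mean does not depend on the outer summation index. The following Python sketch illustrates this for scalar-valued levels only (all $m_j = 1$); the function names `composite_plugin` and `mean_semideviation` are hypothetical and serve only as an illustration of Example 2.1, not as the authors' implementation.

```python
def composite_plugin(sample, f_inner, f_outer):
    """Plug-in estimate of E[f_1(E[f_2(... f_k(E[f_{k+1}(X)], X) ...), X)], X)].

    sample  : list of scalar observations X_1, ..., X_n
    f_inner : the innermost function f_{k+1}(x)
    f_outer : list [f_1, ..., f_k]; each f_j is called as f_j(eta, x)
    Scalar case only (an assumption of this sketch).
    """
    n = len(sample)
    # Innermost empirical mean: (1/n) * sum_i f_{k+1}(X_i).
    eta = sum(f_inner(x) for x in sample) / n
    # Work outwards: f_k is applied first, f_1 last.
    for f in reversed(f_outer):
        eta = sum(f(eta, x) for x in sample) / n
    return eta


def mean_semideviation(sample, kappa, p):
    """Mean-semideviation of order p, using f_1, f_2, f_3 of Example 2.1."""
    def f1(eta, x):
        return x + kappa * eta ** (1.0 / p)

    def f2(eta, x):
        return max(0.0, x - eta) ** p

    def f3(x):
        return x

    return composite_plugin(sample, f3, [f1, f2])
```

Evaluated this way, the estimate costs $k + 1$ passes over the sample rather than $n^{k+1}$ function evaluations.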


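In order to formulate the main theorem of this section, we introduce several relevant quantities.
We define:

\[
\bar f_j(\eta_j) = \int_{\mathcal{X}} f_j(\eta_j, x)\, P(dx), \quad j = 1,\dots,k,
\]
\[
\mu_{k+1} = \int_{\mathcal{X}} f_{k+1}(x)\, P(dx), \qquad \mu_j = \bar f_j(\mu_{j+1}), \quad j = 1,\dots,k.
\]

Suppose $I_j$ are compact subsets of $\mathbb{R}^{m_j}$ such that $\mu_{j+1} \in \operatorname{int}(I_j)$, $j = 1,\dots,k$. We introduce the
notation $\mathcal{H} = C_1(I_1) \times C_{m_1}(I_2) \times \cdots \times C_{m_{k-1}}(I_k) \times \mathbb{R}^{m_k}$, where $C_{m_{j-1}}(I_j)$ is the space of continuous
functions on $I_j$ with values in $\mathbb{R}^{m_{j-1}}$, equipped with the usual supremum norm. The space $\mathbb{R}^{m_k}$
is equipped with the Euclidean norm and $\mathcal{H}$ is equipped with the product norm. We use
Hadamard directional derivatives of the functions $f_j(\cdot, x)$ at points $\mu_{j+1}$ in directions $\zeta_{j+1}$, i.e.,

\[
f_j'(\mu_{j+1}, x; \zeta_{j+1}) = \lim_{t \downarrow 0,\ s \to \zeta_{j+1}} \frac{1}{t} \big[ f_j(\mu_{j+1} + t s, x) - f_j(\mu_{j+1}, x) \big].
\]

For every direction $d = (d_1, \dots, d_k, d_{k+1}) \in \mathcal{H}$, we define recursively the sequence of vectors:

\[
\xi_{k+1}(d) = d_{k+1}, \qquad
\xi_j(d) = \int_{\mathcal{X}} f_j'\big(\mu_{j+1}, x; \xi_{j+1}(d)\big)\, P(dx) + d_j(\mu_{j+1}), \quad j = k, k-1, \dots, 1. \tag{5}
\]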

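Theorem 2.2. Suppose the following conditions are satisfied:

(i) $\int \| f_j(\eta_j, x) \|^2\, P(dx) < \infty$ for all $\eta_j \in I_j$, and $\int \| f_{k+1}(x) \|^2\, P(dx) < \infty$;

(ii) For all $x \in \mathcal{X}$, the functions $f_j(\cdot, x)$, $j = 1,\dots,k$, are Lipschitz continuous:

\[
\| f_j(\eta_j', x) - f_j(\eta_j'', x) \| \le \gamma_j(x)\, \| \eta_j' - \eta_j'' \|, \quad \forall\, \eta_j', \eta_j'' \in I_j,
\]

and $\int \gamma_j^2(x)\, P(dx) < \infty$;

(iii) For all $x \in \mathcal{X}$, the functions $f_j(\cdot, x)$, $j = 1,\dots,k$, are Hadamard directionally differentiable.

Then

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \xi_1(W),
\]

where $W(\cdot) = (W_1(\cdot), \dots, W_k(\cdot), W_{k+1})$ is a zero-mean Brownian process on $I = I_1 \times I_2 \times \cdots \times I_k$.
Here $W_j(\cdot)$ is a Brownian process of dimension $m_{j-1}$ on $I_j$, $j = 1,\dots,k$, and $W_{k+1}$ is an $m_k$-dimensional
normal vector. The covariance function of $W$ has the following form:

\[
\begin{aligned}
\operatorname{cov}\big[ W_i(\eta_i), W_j(\eta_j) \big] &= \int_{\mathcal{X}} \big[ f_i(\eta_i, x) - \bar f_i(\eta_i) \big] \big[ f_j(\eta_j, x) - \bar f_j(\eta_j) \big]^{\top} P(dx), \\
&\qquad \eta_i \in I_i,\ \eta_j \in I_j,\ i, j = 1,\dots,k, \\
\operatorname{cov}\big[ W_i(\eta_i), W_{k+1} \big] &= \int_{\mathcal{X}} \big[ f_i(\eta_i, x) - \bar f_i(\eta_i) \big] \big[ f_{k+1}(x) - \mu_{k+1} \big]^{\top} P(dx), \\
&\qquad \eta_i \in I_i,\ i = 1,\dots,k, \\
\operatorname{cov}\big[ W_{k+1}, W_{k+1} \big] &= \int_{\mathcal{X}} \big[ f_{k+1}(x) - \mu_{k+1} \big] \big[ f_{k+1}(x) - \mu_{k+1} \big]^{\top} P(dx).
\end{aligned} \tag{6}
\]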

Proof. We define $I = I_1 \times I_2 \times \cdots \times I_k$, $M = m_0 + m_1 + \cdots + m_k$, and the vector-valued function
$f : I \times \mathcal{X} \to \mathbb{R}^M$ with block coordinates $f_j(\eta_j, x)$, $j = 1,\dots,k$, and $f_{k+1}(x)$. Similarly, we define
$\bar f : I \to \mathbb{R}^M$ with block coordinates $\bar f_j(\eta_j)$, $j = 1,\dots,k$, and $\mu_{k+1}$. Consider the empirical estimates
of the function $\bar f(\eta)$:

\[
h^{(n)}(\eta) = \frac{1}{n} \sum_{i=1}^{n} f(\eta, X_i), \quad n = 1, 2, \dots.
\]

Due to assumptions (i)-(ii), all functions $h^{(n)}$ are elements of the space $\mathcal{H}$.

Furthermore, assumptions (i)-(ii) guarantee that the class of functions $f(\eta, \cdot)$, $\eta \in I$, is Donsker,
that is, the following uniform Central Limit Theorem holds (see [38, Ex. 19.7]):

\[
\sqrt{n}\, \big( h^{(n)} - \bar f \big) \xrightarrow{\mathcal{D}} W, \tag{7}
\]

where $W$ is a zero-mean Brownian process on $I$ with covariance function

\[
\operatorname{cov}\big[ W(\eta'), W(\eta'') \big] = \int_{\mathcal{X}} \big[ f(\eta', x) - \bar f(\eta') \big] \big[ f(\eta'', x) - \bar f(\eta'') \big]^{\top} P(dx). \tag{8}
\]

This fact will allow us to establish asymptotic properties of the sequence $\{ \rho^{(n)} \}$.

First, we define a subset $H$ of $\mathcal{H}$ containing all elements $(h_1, \dots, h_k, h_{k+1})$ for which
$h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1}) \cdots)) \in I_j$, $j = 1,\dots,k$. We define an operator $\Psi : H \to \mathbb{R}$ as follows:

\[
\Psi(h) = h_1\big( h_2( \cdots h_k(h_{k+1}) \cdots ) \big).
\]

By construction, the value of $\rho(X)$ is equal to the value of $\Psi(\bar f)$ and the value of $\rho^{(n)}$ is equal to
the value of $\Psi(h^{(n)})$.

To derive the limit properties of the sequence $\{ \rho^{(n)} \}$ we shall use the Delta Theorem (see [33]).
The essence of applying the theorem is in identifying conditions under which a statement about
a limit result related to convergence in distribution of a scaled version of a statistic $h^{(n)}$ can be
translated into a statement about convergence in distribution of a scaled version of the transformed
statistic $\Psi(h^{(n)})$.

To this end, we have to verify Hadamard directional differentiability of $\Psi(\cdot)$ at $\bar f$.

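Observe that the point $\bar f$ is an element of $H$, because $\mu_{j+1} \in \operatorname{int}(I_j)$, $j = 1,\dots,k$. Moreover,
due to assumption (ii), the following inequality is true for every $j = 1,\dots,k$:

\[
\begin{aligned}
\big\| h_j\big( h_{j+1}( h_{j+2}( \cdots h_k(h_{k+1}) \cdots )) \big) - \mu_j \big\|
&\le \| h_j - \bar f_j \| + \big\| \bar f_j\big( h_{j+1}( h_{j+2}( \cdots h_k(h_{k+1}) \cdots )) \big) - \bar f_j(\mu_{j+1}) \big\| \\
&\le \| h_j - \bar f_j \| + \int \gamma_j(x)\, P(dx) \cdot \big\| h_{j+1}( h_{j+2}( \cdots h_k(h_{k+1}) \cdots )) - \mu_{j+1} \big\|.
\end{aligned}
\]

Recursive application of this inequality demonstrates that $\bar f$ is an interior point of $H$. Therefore,
the quotients appearing in the definition of the Hadamard directional derivative are well defined.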

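Conditions (ii) and (iii) imply that the functions $\bar f(\cdot)$ and $h^{(n)}(\cdot)$ are also Hadamard directionally
differentiable. Consider the operator $\Psi_k(h) = h_k(h_{k+1})$ at $h \in H$. Let $d^{\ell} = (d_1^{\ell}, \dots, d_{k+1}^{\ell}) \in \mathcal{H}$
be a sequence of directions converging in norm to an arbitrary direction $d \in \mathcal{H}$ when $\ell \to \infty$.
For a sequence $t_\ell \downarrow 0$ and $\ell$ sufficiently large, we have

\[
\begin{aligned}
\Psi_k'(h; d) &= \lim_{\ell \to \infty} \frac{1}{t_\ell} \big[ \Psi_k(h_k + t_\ell d_k^{\ell},\, h_{k+1} + t_\ell d_{k+1}^{\ell}) - \Psi_k(h_k, h_{k+1}) \big] \\
&= \lim_{\ell \to \infty} \frac{1}{t_\ell} \big( [h_k + t_\ell d_k^{\ell}](h_{k+1} + t_\ell d_{k+1}^{\ell}) - h_k(h_{k+1}) \big) \\
&= \lim_{\ell \to \infty} \frac{1}{t_\ell} \big( h_k(h_{k+1} + t_\ell d_{k+1}^{\ell}) - h_k(h_{k+1}) \big) + \lim_{\ell \to \infty} d_k^{\ell}(h_{k+1} + t_\ell d_{k+1}^{\ell}) \\
&= h_k'(h_{k+1}; d_{k+1}) + d_k(h_{k+1}).
\end{aligned}
\]

Consider now the operator $\Psi_{k-1}(h) = h_{k-1}(h_k(h_{k+1})) = h_{k-1}(\Psi_k(h))$. By the chain rule for
Hadamard directional derivatives we obtain

\[
\Psi_{k-1}'(h; d) = h_{k-1}'\big( \Psi_k(h); \Psi_k'(h; d) \big) + d_{k-1}\big( \Psi_k(h) \big).
\]

In this way, we can recursively calculate the Hadamard directional derivatives of the operators
$\Psi_j(h) = h_j( \cdots h_k(h_{k+1}) \cdots )$:

\[
\Psi_j'(h; d) = h_j'\big( \Psi_{j+1}(h); \Psi_{j+1}'(h; d) \big) + d_j\big( \Psi_{j+1}(h) \big), \quad j = k, k-1, \dots, 1. \tag{9}
\]

Now the Delta Theorem [33], relation (7), and the Hadamard directional differentiability of $\Psi(\cdot)$ at
$\bar f$ imply that

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho(X) \big] = \sqrt{n}\, \big[ \Psi(h^{(n)}) - \Psi(\bar f) \big] \xrightarrow{\mathcal{D}} \Psi'(\bar f; W). \tag{10}
\]

The application of the recursive procedure (9) at $h = \bar f$ and $d = W$ leads to formulae (5). The
covariance structure (6) of $W$ follows directly from (8). □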

We return to Example 2.1 and apply Theorem 2.2.

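Example 2.3 (Semideviations continued). We have defined the mappings

\[
\bar f_1(\eta_1) = \mathbb{E}[X] + \kappa\, \eta_1^{1/p} = \int f_1(\eta_1, x)\, P(dx),
\]
\[
\bar f_2(\eta_2) = \mathbb{E}\big\{ [\max\{0, X - \eta_2\}]^p \big\},
\]

and the constants

\[
\mu_3 = \mathbb{E}[X], \qquad \mu_2 = \mathbb{E}\big\{ [\max\{0, X - \mathbb{E}[X]\}]^p \big\}, \qquad \mu_1 = \rho(X).
\]

We assume that $p > 1$ and $I_2 \subset \mathbb{R}$ is a compact interval containing the support of the random
variable $X$. The interval $I_1 = [0, a] \subset \mathbb{R}$ can be defined by choosing $a$ so that $a > |X - \mathbb{E}[X]|^p$; for
example, $a$ may be equal to the diameter of the support of $X$ raised to power $p$. The space $\mathcal{H}$ is
$C_1(I_1) \times C_1(I_2) \times \mathbb{R}$ and we take a direction $d \in \mathcal{H}$. Following (5), we calculate

\[
\xi_2(d) = \bar f_2'(\mu_3; d_3) + d_2(\mu_3) = -p\, \mathbb{E}\big\{ [\max\{0, X - \mu_3\}]^{p-1} \big\}\, d_3 + d_2(\mu_3),
\]
\[
\xi_1(d) = \bar f_1'(\mu_2; \xi_2(d)) + d_1(\mu_2) = \frac{\kappa}{p}\, \mu_2^{\frac{1}{p} - 1}\, \xi_2(d) + d_1(\mu_2).
\]

We obtain the expression

\[
\xi_1(W) = W_1\big( \mathbb{E}\{ [\max\{0, X - \mathbb{E}[X]\}]^p \} \big)
 + \frac{\kappa}{p} \Big( \mathbb{E}\big\{ [\max\{0, X - \mathbb{E}[X]\}]^p \big\} \Big)^{\frac{1-p}{p}}
 \Big( W_2(\mathbb{E}[X]) - p\, \mathbb{E}\big\{ [\max\{0, X - \mathbb{E}[X]\}]^{p-1} \big\}\, W_3 \Big). \tag{11}
\]

The covariance structure of the process $W$ can be determined from (6). The process $W_1(\cdot)$ has the
constant covariance function:

\[
\operatorname{cov}\big[ W_1(\eta'), W_1(\eta'') \big]
 = \int_{\mathcal{X}} \big[ f_1(\eta', x) - \bar f_1(\eta') \big] \big[ f_1(\eta'', x) - \bar f_1(\eta'') \big]\, P(dx) = \operatorname{Var}[X].
\]

It follows that $W_1(\cdot)$ has constant paths. The third coordinate, $W_3$, has variance equal to $\operatorname{Var}[X]$.
It also follows from (6) that $\operatorname{cov}[W_1(\eta), W_3] = \operatorname{Var}[X]$. Therefore, $W_1$ and $W_3$ are, in fact, one
normal random variable, which we denote by $V_1$.

Observe that (11) involves only the value of the process $W_2$ at $\mu_3 = \mathbb{E}[X]$. The variance of the
random variable $V_2 = W_2(\mathbb{E}[X])$ and its covariance with $V_1$ can be calculated from (6) in a similar
way:

\[
\operatorname{Var}[V_2] = \mathbb{E}\Big\{ \Big( [\max\{0, X - \mathbb{E}[X]\}]^p - \mathbb{E}\big( [\max\{0, X - \mathbb{E}[X]\}]^p \big) \Big)^2 \Big\},
\]
\[
\operatorname{cov}[V_2, V_1] = \mathbb{E}\Big\{ \Big( [\max\{0, X - \mathbb{E}[X]\}]^p - \mathbb{E}\big( [\max\{0, X - \mathbb{E}[X]\}]^p \big) \Big) \big( X - \mathbb{E}[X] \big) \Big\}.
\]

Formula (11) becomes

\[
\xi_1(W) = V_1 + \frac{\kappa}{p} \Big( \mathbb{E}\big\{ [\max\{0, X - \mathbb{E}[X]\}]^p \big\} \Big)^{\frac{1-p}{p}}
 \Big( V_2 - p\, \mathbb{E}\big\{ [\max\{0, X - \mathbb{E}[X]\}]^{p-1} \big\}\, V_1 \Big). \tag{12}
\]

We conclude that

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \mathcal{N}(0, \sigma^2),
\]

where the variance $\sigma^2$ can be calculated in a routine way as the variance of the right-hand side of
(12), by substituting the expressions for the variances and covariances of $W_1$, $W_2$, and $W_3$. ▲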
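
As an illustration of how $\sigma^2$ might be estimated in practice, the following Python sketch replaces every expectation in (12) by a sample average. Writing $a = \frac{\kappa}{p}\mu_2^{(1-p)/p}$ and $b = p\,\mathbb{E}\{[\max\{0, X - \mathbb{E}[X]\}]^{p-1}\}$, the right-hand side of (12) equals $(1 - ab)V_1 + aV_2$, so $\sigma^2$ is the variance of $(1 - ab)(X - \mathbb{E}[X]) + a\big([\max\{0, X - \mathbb{E}[X]\}]^p - \mu_2\big)$. The function name `semideviation_clt` is hypothetical, and the sketch assumes $p > 1$ and a sample that is not constant.

```python
def semideviation_clt(sample, kappa, p):
    """Plug-in estimates of the mean-semideviation (1) and of the
    asymptotic variance sigma^2 obtained from formula (12).

    Assumes p > 1 and a non-degenerate sample (so that mu2 > 0).
    """
    n = len(sample)
    mean = sum(sample) / n                              # estimate of mu_3 = E[X]
    plus = [max(0.0, x - mean) for x in sample]         # (X_i - mean)_+
    mu2 = sum(d ** p for d in plus) / n                 # estimate of mu_2
    rho = mean + kappa * mu2 ** (1.0 / p)               # estimate of rho(X)

    a = (kappa / p) * mu2 ** ((1.0 - p) / p)            # coefficient kappa/p * mu_2^((1-p)/p)
    b = p * sum(d ** (p - 1) for d in plus) / n         # p * E[(X - E X)_+^(p-1)]

    # Centered per-observation terms whose variance equals that of (12):
    # (1 - a*b) * V_1 + a * V_2.
    terms = [(1.0 - a * b) * (x - mean) + a * (d ** p - mu2)
             for x, d in zip(sample, plus)]
    sigma2 = sum(v * v for v in terms) / n
    return rho, sigma2
```

An approximate standard error of $\rho^{(n)}$ is then $\sqrt{\sigma^2 / n}$.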


Remark 2.4. Following Example 2.3, we could derive the limiting distribution of $\sqrt{n}\, [\rho^{(n)} - \rho]$ for
$p = 1$ as well. However, the risk measure for $p = 1$ enjoys a simpler form and is already analysed
in the literature (see, Section 6.5].)


3 Estimation of Risk Measures Representable as Optimal Value 
of Composite Functional 

As an extension of the methods of Section 2, we consider the following general setting. Functions
$f_1 : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}$, $f_2 : \mathbb{R}^d \times \mathbb{R}^s \to \mathbb{R}^m$, and a random vector $X$ in $\mathbb{R}^s$ are given. Our intention is
to estimate the value of a composite risk functional

\[
\rho = \min_{z \in Z} f_1\big( z, \mathbb{E}[f_2(z, X)] \big), \tag{13}
\]

where $Z \subset \mathbb{R}^d$ is a nonempty compact set.

We note that the compactness restriction is made for technical convenience and can be relaxed.
Let $X_1, \dots, X_n$ be a random iid sample from the probability distribution $P$ of $X$. We construct
the empirical estimate

\[
\rho^{(n)} = \min_{z \in Z} f_1\Big( z, \frac{1}{n} \sum_{i=1}^{n} f_2(z, X_i) \Big).
\]

Our intention is to analyze the asymptotic behavior of $\rho^{(n)}$ as $n \to \infty$.

Following the method of Section 2, we define the mapping $\Phi : Z \times C(Z) \to \mathbb{R}$ as follows:

\[
\Phi(z, h) = f_1(z, h(z)).
\]

The space $\mathbb{R}^d \times C(Z)$ is equipped with the product norm of the Euclidean norm on $\mathbb{R}^d$ and the
supremum norm on $C(Z)$. We also define the functional $v : C(Z) \to \mathbb{R}$,

\[
v(h) = \min_{z \in Z} \Phi(z, h). \tag{14}
\]

Setting

\[
\bar h(z) = \mathbb{E}[f_2(z, X)], \qquad h^{(n)}(z) = \frac{1}{n} \sum_{i=1}^{n} f_2(z, X_i),
\]

we see that

\[
\rho = v(\bar h), \qquad \rho^{(n)} = v(h^{(n)}), \quad n = 1, 2, \dots.
\]

Let $\hat Z$ denote the set of optimal solutions of problem (13).

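Theorem 3.1. In addition to the general assumptions, suppose the following conditions are
satisfied:

(i) The function $f_2(z, \cdot)$ is measurable for all $z \in Z$;

(ii) The function $f_1(z, \cdot)$ is differentiable for all $z \in Z$, and both $f_1(\cdot, \cdot)$ and its derivative with
respect to the second argument, $\nabla f_1(\cdot, \cdot)$, are continuous with respect to both arguments;

(iii) An integrable function $\gamma(\cdot)$ exists such that

\[
\| f_2(z', x) - f_2(z'', x) \| \le \gamma(x)\, \| z' - z'' \|
\]

for all $z', z'' \in Z$ and all $x \in \mathcal{X}$; moreover, $\int \gamma^2(x)\, P(dx) < \infty$.

Then

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \min_{z \in \hat Z} \big\langle \nabla f_1\big( z, \mathbb{E}[f_2(z, X)] \big), W(z) \big\rangle, \tag{15}
\]

where $W(z)$ is a zero-mean Brownian process on $Z$ with the covariance function

\[
\operatorname{cov}\big[ W(z'), W(z'') \big] = \int_{\mathcal{X}} \big( f_2(z', x) - \mathbb{E}[f_2(z', X)] \big) \big( f_2(z'', x) - \mathbb{E}[f_2(z'', X)] \big)^{\top} P(dx). \tag{16}
\]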

Proof. Observe that assumptions (i)-(ii) of Theorem 2.2 are satisfied due to the compactness of the
set $Z$ and assumptions (ii)-(iii) of this theorem. Therefore, formula (7) holds:

\[
\sqrt{n}\, \big( h^{(n)} - \bar h \big) \xrightarrow{\mathcal{D}} W.
\]

The limiting process $W$ is a zero-mean Brownian process on $Z$ with covariance function (16).

Furthermore, due to assumption (ii), the function $\Phi(\cdot, h)$ is continuous. As the set $Z$ is compact,
problem (14) has a nonempty solution set $S(h)$. By virtue of [5, Theorem 4.13], the optimal value
function $v(\cdot)$ is Hadamard directionally differentiable at $\bar h$ in every direction $d$ with

\[
v'(\bar h; d) = \min_{z \in S(\bar h)} \Phi_h'(z, \bar h)\, d,
\]

where $\Phi_h'(z, h)$ is the Fréchet derivative of $\Phi(z, \cdot)$ at $h$. Therefore, we can apply the delta method
([33]) to infer that

\[
\sqrt{n}\, \big( v(h^{(n)}) - v(\bar h) \big) \xrightarrow{\mathcal{D}} \min_{z \in S(\bar h)} \Phi_h'(z, \bar h)\, W.
\]

Substituting the functional form of $\Phi$, we obtain

\[
\Phi_h'(z, \bar h) = \nabla f_1\big( z, \mathbb{E}[f_2(z, X)] \big)\, \delta_z,
\]

where $\delta_z$ is the Dirac measure at $z$. Application of this operator to the process $W$ yields formula
(15). Observe that $W(\cdot)$ has continuous paths and the minimum exists. □

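Corollary 3.2. If, in addition to the conditions of Theorem 3.1, the set $\hat Z$ contains only one element
$\hat z$, then the following central limit formula holds:

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \big\langle \nabla f_1\big( \hat z, \mathbb{E}[f_2(\hat z, X)] \big), W(\hat z) \big\rangle, \tag{17}
\]

where $W(\hat z)$ is a zero-mean normal vector with the covariance

\[
\operatorname{cov}\big[ W(\hat z), W(\hat z) \big] = \operatorname{cov}\big[ f_2(\hat z, X), f_2(\hat z, X) \big].
\]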
The following examples show that two notable categories of risk measures fall into the structure
(13).

Example 3.3 (Average Value at Risk). Average Value at Risk (2) is one of the most popular
and most basic coherent measures of risk. Recall that for a random variable $X$, it is representable
as follows:

\[
\operatorname{AVaR}_\alpha(X) = \min_{z \in \mathbb{R}} \Big\{ z + \frac{1}{\alpha}\, \mathbb{E}[(X - z)_+] \Big\}.
\]

This measure fits in the structure (13) by setting

\[
f_1(z, \eta) = z + \frac{1}{\alpha}\, \eta,
\]
\[
f_2(z, X) = \max(0, X - z).
\]

The plug-in empirical estimators of (2) have the following form:

\[
\rho^{(n)} = \min_{z \in \mathbb{R}} \Big\{ z + \frac{1}{\alpha n} \sum_{i=1}^{n} \max(0, X_i - z) \Big\}.
\]

If the support of the distribution of $X$ is bounded, then so is the support of all empirical distributions,
and we can assume that the set $Z$ contains the support of the distribution. Observe that all assumptions
of Theorem 3.1 are satisfied. If the distribution function of the random variable $X$ is continuous
at $\alpha$, then the solution $\hat z$ of the optimization problem on the right-hand side of (2) is unique. In that
case, the assumptions of Corollary 3.2 are satisfied as well. We conclude that

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \frac{1}{\alpha}\, W,
\]

where $W$ is a normal random variable with zero mean and variance

\[
\operatorname{Var}[W] = \mathbb{E}\Big\{ \big( \max(0, X - \hat z) - \mathbb{E}[\max(0, X - \hat z)] \big)^2 \Big\}.
\]

We note that the assumption of bounded support of the random variable $X$ is not really essential,
because we could take a sufficiently large set $Z$, which would contain the corresponding quantile
of the distribution function of $X$ and all empirical quantiles for sufficiently large sample sizes.

Additionally, we refer to another method for estimating the Average Value at Risk at all levels
simultaneously, which is discussed in [8], where central limit formulae under a different set of
assumptions are also established. ▲
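
The empirical objective in Example 3.3 is piecewise linear and convex in $z$, so its minimum is attained at one of the sample points. The following Python sketch uses this fact to compute the plug-in estimate, together with an approximate standard error $\frac{1}{\alpha}\sqrt{\operatorname{Var}[W]/n}$ in which $\operatorname{Var}[W]$ is replaced by its sample counterpart. The function name `avar_plugin` is hypothetical, and the standard error is meaningful only under the uniqueness assumption of Corollary 3.2.

```python
import math


def avar_plugin(sample, alpha):
    """Plug-in estimate of AVaR_alpha, a minimizer z, and an approximate
    standard error based on the central limit formula of Example 3.3."""
    xs = sorted(sample)
    n = len(xs)
    best_val, best_z = None, None
    tail = 0.0  # running sum of the observations strictly above position i
    for i in range(n - 1, -1, -1):
        z = xs[i]
        # Objective z + (1/(alpha*n)) * sum_j max(0, x_j - z) at z = xs[i].
        val = z + (tail - (n - 1 - i) * z) / (alpha * n)
        if best_val is None or val < best_val:
            best_val, best_z = val, z
        tail += z
    # Sample variance of max(0, X - z) at the selected minimizer.
    exc = [max(0.0, x - best_z) for x in xs]
    mean_exc = sum(exc) / n
    var_w = sum((e - mean_exc) ** 2 for e in exc) / n
    std_err = math.sqrt(var_w / n) / alpha
    return best_val, best_z, std_err
```

For the sample $1, 2, \dots, 10$ and $\alpha = 0.2$, the estimate is the average of the two largest observations.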

Example 3.4 (Higher-order Inverse Risk Measures). Consider a higher-order inverse risk
measure (4) with $c = 1/\alpha > 1$:

\[
\rho[X] = \min_{z \in \mathbb{R}} \big\{ z + c\, \| \max(0, X - z) \|_p \big\}, \tag{18}
\]

where $p > 1$ and $\| \cdot \|_p$ is the norm in the $\mathcal{L}_p$ space. We define:

\[
f_1(z, y) = z + c\, y^{1/p},
\]
\[
f_2(z, x) = (\max(0, x - z))^p.
\]

If the support of the distribution of $X$ is bounded, so is the support of all empirical distributions. In
this case, we can find a bounded set $Z$ (albeit larger than the support of $X$) such that all solutions
of problems (18) belong to this set. For $p > 1$ and $c > 1$, problem (18) has a unique solution, which
we denote by $\hat z$.

The plug-in empirical estimators of (18) have the following form:

\[
\rho^{(n)} = \min_{z \in \mathbb{R}} \Big\{ z + c \Big[ \frac{1}{n} \sum_{i=1}^{n} (\max(0, X_i - z))^p \Big]^{1/p} \Big\}. \tag{19}
\]

Observe that all assumptions of Theorem 3.1 and Corollary 3.2 are satisfied. We conclude that

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \frac{c}{p} \Big( \mathbb{E}\big[ (\max(0, X - \hat z))^p \big] \Big)^{\frac{1-p}{p}} W, \tag{20}
\]

where $W$ is a normal random variable with zero mean and variance

\[
\operatorname{Var}[W] = \mathbb{E}\Big\{ \Big[ (\max(0, X - \hat z))^p - \mathbb{E}\big[ (\max(0, X - \hat z))^p \big] \Big]^2 \Big\}.
\]
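
Because the objective in (19) is a convex function of the single variable $z$, the plug-in estimate can also be obtained by a one-dimensional search. The following Python sketch uses a ternary search over a bracket that is widened to the left until it contains a minimizer; this is an illustrative alternative, not the solver used by the authors, and the function name `higher_order_plugin` and the iteration count are assumptions of the sketch.

```python
def higher_order_plugin(sample, c, p, iters=200):
    """Plug-in estimate (19) of the higher-order inverse risk measure (18).

    Returns (estimated risk, approximate minimizer z).  Assumes c > 1, p >= 1.
    """
    n = len(sample)

    def objective(z):
        s = sum(max(0.0, x - z) ** p for x in sample) / n
        return z + c * s ** (1.0 / p)

    # For z >= max(sample) the objective equals z, so a minimizer lies below hi.
    hi = max(sample)
    lo = min(sample)
    width = max(hi - lo, 1.0)
    # Widen the bracket to the left while the objective keeps decreasing there;
    # by convexity, a minimizer lies in [lo - width, hi] once this loop stops.
    while objective(lo - width) < objective(lo):
        lo -= width
        width *= 2.0
    lo -= width

    # Ternary search for the minimum of a convex function on [lo, hi].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    z = 0.5 * (lo + hi)
    return objective(z), z
```

For $p = 1$ the same routine returns the plug-in estimate of $\operatorname{AVaR}_{1/c}$, which gives a simple consistency check.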


4 Estimation of Optimized Composite Risk Functionals 


In this section, we are concerned with optimization problems in which the objective function is a 
composite risk functional. Our goal is to establish a central limit formula for the optimal value of 
such problems. 

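Our methods allow for the analysis of more complicated structures of optimized risk functionals:

\[
\rho = \min_{u \in U} \mathbb{E}\Big[ f_1\Big( u, \mathbb{E}\big[ f_2\big( u, \mathbb{E}[\, \cdots f_k( u, \mathbb{E}[f_{k+1}(u, X)], X)\, ] \cdots, X \big) \big], X \Big) \Big]. \tag{21}
\]

Here $X$ is an $m$-dimensional random vector, $f_j : U \times \mathbb{R}^{m_j} \times \mathbb{R}^m \to \mathbb{R}^{m_{j-1}}$, $j = 1,\dots,k$, with $m_0 = 1$,
and $f_{k+1} : U \times \mathbb{R}^m \to \mathbb{R}^{m_k}$. We assume that $U$ is a compact set in a finite-dimensional space and
the optimal solution $\hat u$ of this problem is unique.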

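We define the functions:

\[
\bar f_j(u, \eta_j) = \int_{\mathcal{X}} f_j(u, \eta_j, x)\, P(dx), \quad j = 1,\dots,k,
\]
\[
\bar f_{k+1}(u) = \int_{\mathcal{X}} f_{k+1}(u, x)\, P(dx),
\]

and the quantities

\[
\mu_{k+1} = \bar f_{k+1}(\hat u), \qquad \mu_j = \bar f_j(\hat u, \mu_{j+1}), \quad j = k, k-1, \dots, 1.
\]

We assume that compact sets $I_1, \dots, I_k$ are selected so that $\operatorname{int}(I_k) \supset \bar f_{k+1}(U)$, and $\operatorname{int}(I_j) \supset
\bar f_{j+1}(U, I_{j+1})$, $j = 1,\dots,k-1$. Let us define the space

\[
\mathcal{H} = C_1^{(0,1)}(U \times I_1) \times C_{m_1}^{(0,1)}(U \times I_2) \times \cdots \times C_{m_{k-1}}^{(0,1)}(U \times I_k) \times C_{m_k}(U),
\]

where $C_{m_{j-1}}^{(0,1)}(U \times I_j)$ is the space of $\mathbb{R}^{m_{j-1}}$-valued continuous functions on $U \times I_j$ which are
differentiable with respect to the second argument with continuous derivatives on $U \times I_j$. We denote the
Jacobian of $f_j(u, \eta_j, x)$ with respect to the second argument at $\eta_j^* \in I_j$ by $f_j'(u, \eta_j^*, x)$. For every
direction $d \in \mathcal{H}$, we define recursively the sequence of vectors:

\[
\xi_{k+1}(d) = d_{k+1}, \qquad
\xi_j(d) = \int_{\mathcal{X}} f_j'(\hat u, \mu_{j+1}, x)\, \xi_{j+1}(d)\, P(dx) + d_j(\mu_{j+1}), \quad j = k, k-1, \dots, 1. \tag{22}
\]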

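The empirical estimator is

\[
\rho^{(n)} = \min_{u \in U} \frac{1}{n} \sum_{i_0=1}^{n} f_1\Big( u, \frac{1}{n} \sum_{i_1=1}^{n} f_2\Big( u, \frac{1}{n} \sum_{i_2=1}^{n} \cdots f_k\Big( u, \frac{1}{n} \sum_{i_k=1}^{n} f_{k+1}(u, X_{i_k}), X_{i_{k-1}} \Big) \cdots, X_{i_1} \Big), X_{i_0} \Big).
\]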

We establish the following result. 


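Theorem 4.1. Suppose the following conditions are satisfied:

(i) $\int_{\mathcal{X}} \| f_j(u, \eta_j, x) \|^2\, P(dx) < \infty$ for all $\eta_j \in I_j$, $u \in U$, $j = 1,\dots,k$, and
$\int_{\mathcal{X}} \| f_{k+1}(u, x) \|^2\, P(dx) < \infty$ for all $u \in U$;

(ii) The functions $f_j(\cdot, \cdot, x)$, $j = 1,\dots,k$, and $f_{k+1}(\cdot, x)$ are Lipschitz continuous for every $x \in \mathcal{X}$:

\[
\| f_j(u', \eta_j', x) - f_j(u'', \eta_j'', x) \| \le \gamma_j(x) \big( \| u' - u'' \| + \| \eta_j' - \eta_j'' \| \big), \quad j = 1,\dots,k,
\]
\[
\| f_{k+1}(u', x) - f_{k+1}(u'', x) \| \le \gamma_{k+1}(x)\, \| u' - u'' \|,
\]

for all $\eta_j', \eta_j'' \in I_j$, $u', u'' \in U$; moreover, $\int \gamma_j^2(x)\, P(dx) < \infty$, $j = 1,\dots,k+1$;

(iii) The functions $f_j(u, \cdot, x)$, $j = 1,\dots,k$, are continuously differentiable for every $x \in \mathcal{X}$, $u \in U$;
moreover, their derivatives are continuous with respect to the first two arguments.

Then

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \xi_1(W),
\]

where $W(\cdot) = (W_1(\cdot), \dots, W_k(\cdot), W_{k+1})$ is a zero-mean Brownian process on $I = I_1 \times I_2 \times \cdots \times I_k$.
Here $W_j(\cdot)$ is a Brownian process of dimension $m_{j-1}$ on $I_j$, $j = 1,\dots,k$, and $W_{k+1}$ is an $m_k$-dimensional
normal vector. The covariance function of $W(\cdot)$ has the following form:

\[
\begin{aligned}
\operatorname{cov}\big[ W_i(\eta_i), W_j(\eta_j) \big] &= \int_{\mathcal{X}} \big[ f_i(\hat u, \eta_i, x) - \bar f_i(\hat u, \eta_i) \big] \big[ f_j(\hat u, \eta_j, x) - \bar f_j(\hat u, \eta_j) \big]^{\top} P(dx), \\
&\qquad \eta_i \in I_i,\ \eta_j \in I_j,\ i, j = 1,\dots,k, \\
\operatorname{cov}\big[ W_i(\eta_i), W_{k+1} \big] &= \int_{\mathcal{X}} \big[ f_i(\hat u, \eta_i, x) - \bar f_i(\hat u, \eta_i) \big] \big[ f_{k+1}(\hat u, x) - \bar f_{k+1}(\hat u) \big]^{\top} P(dx), \\
&\qquad \eta_i \in I_i,\ i = 1,\dots,k, \\
\operatorname{cov}\big[ W_{k+1}, W_{k+1} \big] &= \int_{\mathcal{X}} \big[ f_{k+1}(\hat u, x) - \bar f_{k+1}(\hat u) \big] \big[ f_{k+1}(\hat u, x) - \bar f_{k+1}(\hat u) \big]^{\top} P(dx).
\end{aligned} \tag{23}
\]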
Proof. We follow the main line of argument of the proof of Theorem 2.2. We define $M = m_0 + m_1 +
\cdots + m_k$ and the vector-valued function $f : U \times I \times \mathcal{X} \to \mathbb{R}^M$ with block coordinates $f_j(u, \eta_j, x)$,
$j = 1,\dots,k$, and $f_{k+1}(u, x)$. Similarly, we define $\bar f : U \times I \to \mathbb{R}^M$ with block coordinates $\bar f_j(u, \eta_j)$,
$j = 1,\dots,k$, and $\bar f_{k+1}(u)$. Consider the empirical estimates of the function $\bar f(u, \eta)$:

\[
h^{(n)}(u, \eta) = \frac{1}{n} \sum_{i=1}^{n} f(u, \eta, X_i), \quad n = 1, 2, \dots.
\]

Due to our assumptions, for sufficiently large $n$ all these functions are elements of the space $\mathcal{H}$.

Owing to assumptions (i)-(ii), the class of functions $f(u, \eta, \cdot)$, $u \in U$, $\eta \in I$, is Donsker, that is,
the following uniform Central Limit Theorem holds (see [38, Ex. 19.7]):

\[
\sqrt{n}\, \big( h^{(n)} - \bar f \big) \xrightarrow{\mathcal{D}} W, \tag{24}
\]

where $W$ is a zero-mean Brownian process on $U \times I$ with covariance function

\[
\operatorname{cov}\big[ W(u', \eta'), W(u'', \eta'') \big] = \int_{\mathcal{X}} \big[ f(u', \eta', x) - \bar f(u', \eta') \big] \big[ f(u'', \eta'', x) - \bar f(u'', \eta'') \big]^{\top} P(dx). \tag{25}
\]

This fact will allow us to establish asymptotic properties of the sequence $\{ \rho^{(n)} \}$. We define an
operator $\Psi : U \times \mathcal{H} \to \mathbb{R}$ as follows:

\[
\Psi(u, h) = h_1\big( u, h_2( u, \cdots h_k(u, h_{k+1}(u)) \cdots ) \big).
\]

By definition,

\[
\rho(X) = \min_{u \in U} \Psi(u, \bar f), \qquad \rho^{(n)} = \min_{u \in U} \Psi(u, h^{(n)}).
\]

To apply the Delta Theorem to the sequence $\{ \rho^{(n)} \}$, we have to verify Hadamard directional
differentiability of the optimal value function $v(\cdot) = \min_{u \in U} \Psi(u, \cdot)$ at $\bar f$. Observe that our assumptions
imply that the conditions of [5, Thm. 4.13] are satisfied. As the optimal solution set is a singleton,
the function $v(\cdot)$ is differentiable at $\bar f$ with the Fréchet derivative

\[
v'(\bar f) = \Psi'(\hat u, \bar f),
\]

where $\Psi'(\hat u, \bar f)$ is the Fréchet derivative of $\Psi(\hat u, \cdot)$ at $\bar f$. The remaining derivations are identical to
those in the proof of Theorem 2.2. We only need to substitute $\hat u$ as an additional argument of all
functions involved. □

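Example 4.2 (Optimization problems with mean semideviation). Consider now an
optimization problem involving a mean-semideviation measure of risk

\[
\min_{u \in U} \rho[\varphi(u, X)] = \mathbb{E}[\varphi(u, X)] + \kappa \Big( \mathbb{E}\big[ (\varphi(u, X) - \mathbb{E}[\varphi(u, X)])_+^p \big] \Big)^{1/p}, \tag{26}
\]

where $\varphi : U \times \mathcal{X} \to \mathbb{R}$. We have

\[
f_1(\eta_1, u, x) = \kappa\, \eta_1^{1/p} + \varphi(u, x),
\]
\[
f_2(\eta_2, u, x) = \big[ \max\{0, \varphi(u, x) - \eta_2\} \big]^p,
\]
\[
f_3(u, x) = \varphi(u, x),
\]

and

\[
\bar f_1(\eta_1, u) = \kappa\, \eta_1^{1/p} + \mathbb{E}[\varphi(u, X)],
\]
\[
\bar f_2(\eta_2, u) = \mathbb{E}\big\{ [\max\{0, \varphi(u, X) - \eta_2\}]^p \big\},
\]
\[
\bar f_3(u) = \mathbb{E}[\varphi(u, X)].
\]

We assume that $p > 1$. Suppose $\hat u$ is the unique solution of problem (26). We set $\mu_3 = \mathbb{E}[\varphi(\hat u, X)]$.
Then $\mu_2 = \mathbb{E}\{ [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p \}$ and $\mu_1 = \rho(X)$. Following (22), we calculate

\[
\xi_2(d) = \bar f_2'(\mu_3, \hat u; d_3) + d_2(\mu_3) = -p\, \mathbb{E}\big\{ [\max\{0, \varphi(\hat u, X) - \mu_3\}]^{p-1} \big\}\, d_3 + d_2(\mu_3),
\]
\[
\xi_1(d) = \bar f_1'(\mu_2, \hat u; \xi_2(d)) + d_1(\mu_2) = \frac{\kappa}{p}\, \mu_2^{\frac{1}{p} - 1}\, \xi_2(d) + d_1(\mu_2).
\]

We obtain the expression

\[
\xi_1(W) = W_1\big( \mathbb{E}\{ [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p \} \big)
 + \frac{\kappa}{p} \Big( \mathbb{E}\big\{ [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p \big\} \Big)^{\frac{1-p}{p}}
 \Big( W_2(\mathbb{E}[\varphi(\hat u, X)]) - p\, \mathbb{E}\big\{ [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^{p-1} \big\}\, W_3 \Big). \tag{27}
\]

The covariance structure of the process $W$ can be determined from (25), similarly to Example 2.3.
The process $W_1(\cdot)$ has the constant covariance function:

\[
\operatorname{cov}\big[ W_1(\eta_1'), W_1(\eta_1'') \big] = \operatorname{Var}[\varphi(\hat u, X)].
\]

The third coordinate, $W_3$, has variance equal to $\operatorname{Var}[\varphi(\hat u, X)]$. Also,

\[
\operatorname{cov}\big( W_1(\eta_1), W_3 \big) = \operatorname{Var}[\varphi(\hat u, X)],
\]

and thus $W_1$ and $W_3$ have the same normal distribution and are perfectly correlated.

The variance function of $W_2(\cdot)$ and its covariance with $W_1$ (and $W_3$) can be calculated in a
similar way:

\[
\operatorname{Var}\big[ W_2(\mathbb{E}[\varphi(\hat u, X)]) \big] = \mathbb{E}\Big\{ \Big( [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p - \mathbb{E}\big( [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p \big) \Big)^2 \Big\},
\]
\[
\operatorname{cov}\big[ W_2(\mathbb{E}[\varphi(\hat u, X)]), W_3 \big] = \mathbb{E}\Big\{ \Big( [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p - \mathbb{E}\big( [\max\{0, \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)]\}]^p \big) \Big) \big( \varphi(\hat u, X) - \mathbb{E}[\varphi(\hat u, X)] \big) \Big\}.
\]

We conclude that

\[
\sqrt{n}\, \big[ \rho^{(n)} - \rho \big] \xrightarrow{\mathcal{D}} \mathcal{N}(0, \sigma^2),
\]

where the variance $\sigma^2$ can be calculated in a routine way as the variance of the right-hand side of
(27), by substituting the expressions for the variances and covariances of $W_1$, $W_2$, and $W_3$. ▲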
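
A small instance of problem (26) can be sketched as follows. The Python code below takes a hypothetical loss $\varphi(u, x) = u x_1 + (1 - u) x_2$ with $U = [0, 1]$, which is not taken from the text, and minimizes the empirical mean-semideviation over $u$. Since the risk measure is convex for $\kappa \in [0, 1]$ and $\varphi$ is affine in $u$, the empirical objective is convex in $u$, and a ternary search suffices. The names `mean_semideviation` and `optimize_mix` are illustrative assumptions.

```python
def mean_semideviation(values, kappa, p):
    """Empirical mean-semideviation of order p of a list of losses."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum(max(0.0, v - mean) ** p for v in values) / n
    return mean + kappa * mu2 ** (1.0 / p)


def optimize_mix(losses1, losses2, kappa, p, iters=100):
    """Minimize the empirical mean-semideviation of u*X1 + (1-u)*X2 over u in [0, 1].

    losses1, losses2 : paired observations of the two loss components
    Returns (optimal value, approximate minimizer u).
    Assumes kappa in [0, 1], so that the objective is convex in u.
    """
    def objective(u):
        mixed = [u * a + (1.0 - u) * b for a, b in zip(losses1, losses2)]
        return mean_semideviation(mixed, kappa, p)

    lo, hi = 0.0, 1.0
    # Ternary search for the minimum of a convex function on [0, 1].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    u = 0.5 * (lo + hi)
    return objective(u), u
```

With two perfectly negatively correlated components, such as the paired observations $(1, -1)$ and $(-1, 1)$, the minimizer is $u = 1/2$ and the optimal value is zero.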
5 A simulation study


In this section we illustrate the convergence of some estimators discussed in this paper to the limiting 
normal distribution. Many previously known results for the case p = 1 have been investigated 
thoroughly in the literature (see, e.g., [35]) and we will not dwell upon these here. We will only 
illustrate the case about Higher-order Inverse Risk Measures as discussed in Example 4 for the case 
p > 1. More specifically, we take independent identically distributed observations Xi, i = 1,2,... ,n 
from an independent identically distributed X ~ AA(0,3) observations. We take e = 0.05 and p = 2. 
In that case c = 20. Numerical calculation in Matlab delivers the theoretical argument minimum 
z* = 14.5048 and the value of the risk in (|18p being = 15.5163. The standard deviation of the 
random variable in the right hand side of ()20p is 16.032. The plug-in estimator of this risk can 
be represented as a solution of a convex optimization problem with convex constraints and hence 
a unique solution can be found by any package that solves such type of problems. We have used 
the cvx package that can be operated within matlab. Denoting di = max(Xj — z,0),i = 1,2,... ,n 
and putting all dj, f = 1, 2 ,... , n in a vector d we can rewrite our optimization problem as follows: 


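\[
\min_{z,\,d} \; \Big\{ \frac{1}{\varepsilon\, n^{1/p}} \Big( \sum_{i=1}^{n} d_i^{\,p} \Big)^{1/p} + z \Big\}
\quad \text{subject to} \quad X_i - z \le d_i, \quad d_i \ge 0, \quad i = 1, 2, \ldots, n. \tag{28}
\]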
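As a minimal sketch of how the optimal value of (28) can be computed without a conic solver, note that 
at the optimum d_i = max(X_i − z, 0), so that (28) reduces to a one-dimensional convex minimization in z. 
The Python code below is only an illustration and not the Matlab/cvx implementation used for the 
results reported here; the function names, the ternary search and the assumption that the minimizer 
lies between the smallest and the largest observation are ours. 

```python
def objective(z, xs, c, p):
    """Empirical objective z + c * (mean of max(x - z, 0)^p)^(1/p)."""
    tail = sum((x - z) ** p for x in xs if x > z)
    return z + c * (tail / len(xs)) ** (1.0 / p)


def plug_in_risk(xs, eps=0.05, p=2.0, iters=100):
    """Return (z_hat, rho_hat) for the sample xs.

    Assumption (ours): the minimizer lies in [min(xs), max(xs)].
    The objective is convex in z, so a ternary search on that
    interval converges to a minimizer.
    """
    c = 1.0 / eps
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, xs, c, p) < objective(m2, xs, c, p):
            hi = m2  # a minimizer lies to the left of m2
        else:
            lo = m1  # a minimizer lies to the right of m1
    z_hat = 0.5 * (lo + hi)
    return z_hat, objective(z_hat, xs, c, p)
```

For p = 1 and ε = 0.05 the sketch reproduces the empirical AVaR: for the sample 1, 2, ..., 100 it 
returns the average of the five largest observations, 98. 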


The numerical solution to this optimization problem gives us the estimator ϱ^(n). To get an idea 
about the speed of convergence to the limiting distribution in (19), we simulate m = 2500 risk 
estimators ϱ_j^(n), j = 1, 2, ..., 2500, for a given sample size n and draw their histogram. The number 
of bins for the histogram is determined by the rough “square root of the sample size” rule. This 
histogram is superimposed on the N(15.5163, (16.032/√n)^2) density. As n is increased, our theory 
suggests that the histogram and the normal density graph will look more and more similar in shape. 
Their closeness indicates how quickly the central limit theorem takes effect in this case. 
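The simulation design just described can be sketched in a few lines of Python. This is an illustration 
under our own assumptions rather than the code used for the figures: the risk is computed by a 
one-dimensional ternary search instead of cvx, N(10, 3) is read as having variance 3, and the 
demonstration uses a much smaller m than 2500 to keep the run time short. All function names are 
hypothetical. 

```python
import math
import random
import statistics


def plug_in_risk(xs, c=20.0, p=2.0, iters=40):
    """Plug-in risk: min over z of z + c * (mean(max(x - z, 0)^p))^(1/p)."""
    n = len(xs)

    def obj(z):
        return z + c * (sum((x - z) ** p for x in xs if x > z) / n) ** (1.0 / p)

    lo, hi = min(xs), max(xs)
    for _ in range(iters):  # ternary search on a convex function of z
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return obj(0.5 * (lo + hi))


def simulate(n, m, seed=0):
    """Risk estimates from m independent samples of size n drawn from N(10, 3)."""
    rng = random.Random(seed)
    sd = math.sqrt(3.0)  # assumption: the 3 in N(10, 3) is a variance
    return [plug_in_risk([rng.gauss(10.0, sd) for _ in range(n)]) for _ in range(m)]


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), used for the curve laid over the histogram."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    n, m = 1000, 50                              # the text uses m = 2500
    est = simulate(n, m)
    mu, sigma = 15.5163, 16.032 / math.sqrt(n)   # values quoted in the text
    bins = round(math.sqrt(m))                   # "square root" rule for the bins
    print("mean and sd of estimators:", statistics.mean(est), statistics.stdev(est))
    print("limiting mean and sd:     ", mu, sigma)
    print("bins:", bins, "normal density at its mean:", normal_pdf(mu, mu, sigma))
```

Comparing the empirical mean and standard deviation of the simulated estimators with 15.5163 and 
16.032/√n gives a quick numerical counterpart to the visual comparison of histogram and density. 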

Figure 1 shows that the central limit theorem indeed provides a very good approximation, 
which improves significantly with increasing sample size. The small downward bias that appears in 
Figure 1 a) becomes increasingly irrelevant with growing sample size. We have experimented with 
different values of p, such as p = 1, 1.5, 2 and 2.5, and we have also changed the value of ε (respectively 
c = 1/ε). The tendency shown in Figure 1 is largely upheld; however, as expected, the standard 
errors increase when c and/or p is increased. Also, the limiting normal approximation seems to 
be more accurate for the same sample size when a smaller value of p is used. This effect 
is illustrated in Figure 3 for p = 1 (i.e., the case of AVaR), p = 1.5, p = 2 (where a sample 
different from the one in Figure 1 was simulated) and p = 2.5. The remaining 
quantities have been kept fixed at n = 2000 and c = 20. We stress that increasing the sample size 
in Figure 3 d) makes the histogram look much more like the limiting normal curve, so that the 
discrepancy observed there is indeed just due to the limiting approximation taking effect at larger 
sample sizes when p is increased. 

Figure 1: Density histogram of the distribution of the estimator ϱ^(n) for increasing values of n and 
its normal approximation using Theorem 2 and X ~ N(10, 3). Panels: (a) n = 1000, (b) n = 2000, 
(c) n = 4000, (d) n = 8000; m = 2500, p = 2, c = 20. 

Figure 2: Density histogram of the distribution of the estimator for n = 4000 and X ~ t_ν, with 
ν being 60, 8, 6 and 4. Panels: (a) df = 60, (b) df = 8, (c) df = 6, (d) df = 4; m = 2500, p = 2, c = 20. 

Figure 3: Density histogram of the distribution of the estimator for different values of p when 
X ~ N(10, 3). Panels: (a) p = 1, (b) p = 1.5, (c) p = 2, (d) p = 2.5. 

We also experimented with different distributions for the random variable X. Specifically, we took 
t-distributions with ν = 4, 6, 8 and 60 degrees of freedom, shifted to have the same 
mean of 10 as the simulated normal data. The results of this comparison for p = 2, ε = 0.05 
and n = 4000 are shown in Figure 2. The variances of the t-distributed variables, being equal to 
ν/(ν − 2), are finite and even smaller than the variance of the normal random variable in Figure 
1. However, the heavier tails of the t distribution adversely affect the quality of the approximation. 
Despite the fact that the limiting distribution of the risk estimator is still normal when ν = 6 
and ν = 8, the heavy-tailed data cause the normal approximation to be relatively poor even at 
n = 4000. The case ν = 60 is closer to the normal distribution and hence the approximation works 
better in this case. 
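For readers who wish to reproduce this comparison, the following Python sketch generates shifted 
t-distributed observations with the standard library only and checks the variance ν/(ν − 2) 
empirically. The construction T = Z / sqrt(V/ν), with Z standard normal and V chi-square with ν 
degrees of freedom, is the usual representation of the t distribution; the function names and the 
sample size in the demonstration are our own choices. 

```python
import math
import random
import statistics


def shifted_t(rng, nu, mean=10.0):
    """One draw of mean + T, where T has Student's t distribution with nu df.

    Uses T = Z / sqrt(V / nu) with Z standard normal and V chi-square(nu);
    chi-square(nu) is a gamma distribution with shape nu/2 and scale 2.
    """
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return mean + z / math.sqrt(v / nu)


def t_variance(nu):
    """Variance nu / (nu - 2) of the t distribution (finite only for nu > 2)."""
    return nu / (nu - 2.0) if nu > 2 else math.inf


if __name__ == "__main__":
    rng = random.Random(0)
    for nu in (60, 8, 6, 4):
        xs = [shifted_t(rng, nu) for _ in range(5000)]
        # columns: df, sample mean, sample variance, theoretical variance
        print(nu, round(statistics.mean(xs), 3),
              round(statistics.variance(xs), 3), round(t_variance(nu), 3))
```

All four theoretical variances are below 3, the variance of the normal data, so the poorer 
approximation cannot be attributed to a larger variance. 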

Note that the limiting distribution when p = 2 involves the fourth moment of the t distribution, 
and this moment is finite for ν = 6, 8 and 60 but is infinite when ν = 4. As a result, it can be 
seen from Figure 2 d) that the normal approximation collapses in this case. Also, Figure 2 shows 
that attaining a quality of the asymptotic approximation in the Kolmogorov metric similar to that in 
the case of normally distributed X, in Figure 1 c), requires much bigger samples. For the fixed 
sample size of 4000, the quality of the normal approximation worsens as ν decreases from 60 to 
8 and then to 6. Furthermore, and outside of the scope of the present paper, we note that if the 
distribution of X has even heavier tails than the t distributions considered above (for example, if it is in the class 
of stable distributions with stability parameter in the range (1,2)) then the limiting distribution of 
the risk estimator may not be normal at all. 
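The two quantities used in this discussion can be made concrete with a short Python sketch. The first 
function evaluates the fourth moment of the t distribution, 3ν^2/((ν − 2)(ν − 4)) for ν > 4, which is 
infinite for ν ≤ 4. The second computes the Kolmogorov distance between the empirical distribution of 
a set of simulated estimators and a normal distribution. The function names are hypothetical, and we 
do not claim that the comparisons in the figures were quantified in exactly this way. 

```python
import math


def t_fourth_moment(nu):
    """E[T^4] for Student's t with nu df: 3 nu^2 / ((nu - 2)(nu - 4)) if nu > 4."""
    if nu <= 4:
        return math.inf
    return 3.0 * nu ** 2 / ((nu - 2.0) * (nu - 4.0))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def kolmogorov_distance(values, mu, sigma):
    """sup over x of |F_m(x) - Phi((x - mu) / sigma)|, F_m the empirical cdf."""
    xs = sorted(values)
    m = len(xs)
    dist = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # the empirical cdf jumps from i/m to (i + 1)/m at x
        dist = max(dist, abs(f - i / m), abs((i + 1) / m - f))
    return dist


if __name__ == "__main__":
    for nu in (60, 8, 6, 4):
        print(nu, t_fourth_moment(nu))
```

Applied to m simulated estimators with mu = ϱ and sigma equal to the limiting standard deviation 
divided by √n, the distance gives a single number by which panels such as those in Figures 1 and 2 
could be compared. 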
6 Conclusions 


The infinite-dimensional delta method is a standard statistical technique for evaluating the asymptotic 
distribution of estimators of statistical functionals. The applicability of the procedure hinges on 
verifying smoothness conditions of the related functionals. Motivated primarily by the need to 
estimate coherent risk measures, we introduce a general composite structure for such functionals, 
in which all known coherent risk measures can be cast. The potential applicability of our central 
limit theorems, however, extends beyond functionals representing coherent risk measures. Our short 
simulation study indicates that the central limit theorem-type approximations are very accurate 
when the sample size is large, p is within reasonable limits between 1 and 3, and the distribution of X 
does not have too heavy tails. We note that for smaller sample sizes, the technique of concentration 
inequalities may be more powerful and accurate when evaluating the closeness of the approximation. 
It is possible to derive concentration inequalities for estimators of statistical functionals with the 
structure that has been introduced in our paper. This is a subject of ongoing research. 